Visualizing Amazon Rekognition Label Detection (Image Recognition) Results with Pillow

k.uehara

2022.10.13

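I'm uehara from the Data Analytics Business Division.

In this post, I run Amazon Rekognition label detection (image recognition) from Python and visualize the results with the Pillow package.

Data preparation

Creating an S3 bucket

First, prepare a bucket to store the image data used for label detection.

As an example, I create a bucket named "rekognition-test-uehara".

Storing the images

Once the bucket has been created, store the image data in it.

I prepared several images, including the one below.

(Source)

Program

Getting the list of objects from S3

To get the list of objects in an S3 bucket, use list_objects_v2 on the Boto3 S3 client.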

import io
import os
import pprint

import boto3
from PIL import Image, ImageDraw, ImageFont

session = boto3.session.Session()

s3_client = session.client('s3')
res = s3_client.list_objects_v2(
    Bucket='rekognition-test-uehara'
)

pprint.pprint(res)  # for inspection

Looking at the contents of res, we can see that each file name is available under 'Key' inside 'Contents' in the return value.

'Contents': [{'ETag': '"e049348d59199c08c30bcfec972aa604"',
               'Key': 'car.jpg',
               'LastModified': datetime.datetime(2022, 10, 12, 6, 10, 40, tzinfo=tzutc()),
               'Size': 70591,
               'StorageClass': 'STANDARD'},
              {'ETag': '"4f60606f25c432e215aaf2412e76402c"',
               'Key': 'cat.jpg',
               'LastModified': datetime.datetime(2022, 10, 12, 6, 10, 41, tzinfo=tzutc()),
               'Size': 91009,
               'StorageClass': 'STANDARD'},
              {'ETag': '"83371adb2a4fd15d814264d1656f2cd3"',
               'Key': 'horse_human.jpg',
               'LastModified': datetime.datetime(2022, 10, 12, 6, 10, 41, tzinfo=tzutc()),
               'Size': 73241,
               'StorageClass': 'STANDARD'}]

We use this to write the processing that reads the images in the S3 bucket.
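As a small illustration of pulling the file names out of that response, here is a minimal sketch. The helper name image_keys and the extension list are assumptions made for this example, not part of the original code. It also tolerates a response without 'Contents', which is what an empty bucket returns, and skips keys that are not JPEG or PNG, the two formats Rekognition image operations accept.

```python
# Hypothetical helper; the name and the extension list are assumptions.
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')


def image_keys(response):
    """Return the object keys in a list_objects_v2 response that look like images."""
    # 'Contents' is absent when no object matches, so default to an empty list.
    contents = response.get('Contents', [])
    return [
        obj['Key']
        for obj in contents
        if obj['Key'].lower().endswith(IMAGE_EXTENSIONS)
    ]


# Trimmed-down sample shaped like the response shown above.
sample = {
    'Contents': [
        {'Key': 'car.jpg', 'Size': 70591},
        {'Key': 'cat.jpg', 'Size': 91009},
        {'Key': 'horse_human.jpg', 'Size': 73241},
    ]
}

print(image_keys(sample))
print(image_keys({}))
```

Note that a single list_objects_v2 call returns at most 1,000 keys, so a larger bucket would need pagination on top of this.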

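Label detection with Rekognition

Now that we have the file names in the bucket, we pass them to Rekognition one by one.

Label detection uses detect_labels on the Rekognition client.

Continuing from the code above, the code is as follows.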

rekognition = session.client('rekognition')

for r in res['Contents']:
    filename = r['Key']

    labels = rekognition.detect_labels(
        Image={
            'S3Object': {
                'Bucket': 'rekognition-test-uehara',
                'Name': filename
            }
        }
    )

    pprint.pprint(labels)  # for inspection

Looking at the contents of labels above, we find entries like the following.

 'Labels': [{'Confidence': 99.83468627929688,
             'Instances': [{'BoundingBox': {'Height': 0.7377291917800903,
                                            'Left': 0.0,
                                            'Top': 0.2599560022354126,
                                            'Width': 0.7785698175430298},
                            'Confidence': 99.83468627929688}],
             'Name': 'Horse',
             'Parents': [{'Name': 'Mammal'}, {'Name': 'Animal'}]},
            {'Confidence': 99.83468627929688,
             'Instances': [],
             'Name': 'Mammal',
             'Parents': [{'Name': 'Animal'}]},
            {'Confidence': 99.83468627929688,
             'Instances': [],
             'Name': 'Animal',
             'Parents': []},
            {'Confidence': 99.60975646972656,
             'Instances': [{'BoundingBox': {'Height': 0.7024425864219666,
                                            'Left': 0.2761378884315491,
                                            'Top': 0.06409931182861328,
                                            'Width': 0.2846752405166626},
                            'Confidence': 99.60975646972656}],
             'Name': 'Person',
             'Parents': []},
            {'Confidence': 99.60975646972656,
             'Instances': [],
             'Name': 'Human',
             'Parents': []},
            {'Confidence': 99.6041259765625,
             'Instances': [],
             'Name': 'Equestrian',
             'Parents': [{'Name': 'Person'},
                         {'Name': 'Horse'},
                         {'Name': 'Mammal'},
                         {'Name': 'Animal'}]},
            {'Confidence': 79.40290069580078,
             'Instances': [],
             'Name': 'Rodeo',
             'Parents': []},
            {'Confidence': 75.97457122802734,
             'Instances': [],
             'Name': 'Clothing',
             'Parents': []},
            {'Confidence': 75.97457122802734,
             'Instances': [],
             'Name': 'Apparel',
             'Parents': []}],

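Entries such as 'Name': 'Horse' and 'Name': 'Person' show that Rekognition has returned its label detection (image recognition) results.

Here, 'Confidence' is how certain Rekognition is about the label, expressed as a percentage from 0 to 100.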
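To make the role of 'Confidence' concrete, the following is a minimal sketch that keeps only labels at or above a threshold. The function name confident_labels and the threshold of 90 are assumptions chosen for illustration. The sample values are copied from the output above.

```python
def confident_labels(response, threshold=90.0):
    """Return the names of labels whose Confidence is at least threshold (0-100)."""
    return [
        label['Name']
        for label in response.get('Labels', [])
        if label['Confidence'] >= threshold
    ]


# A few labels taken from the detect_labels output shown above.
sample = {
    'Labels': [
        {'Name': 'Horse', 'Confidence': 99.83468627929688},
        {'Name': 'Person', 'Confidence': 99.60975646972656},
        {'Name': 'Rodeo', 'Confidence': 79.40290069580078},
        {'Name': 'Clothing', 'Confidence': 75.97457122802734},
    ]
}

print(confident_labels(sample))
```

detect_labels also accepts a MinConfidence parameter, so the same filtering can be requested on the service side instead of in Python.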

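One more thing to note is the 'BoundingBox' part.

BoundingBox has the following four properties.

Height: the height of the bounding box as a ratio of the overall image height
Left: the left coordinate of the bounding box as a ratio of the overall image width
Top: the top coordinate of the bounding box as a ratio of the overall image height
Width: the width of the bounding box as a ratio of the overall image width

Illustrated, it looks like this.

(Source)

The region derived from these values is the rectangle of the object that Rekognition detected.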
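Because all four values are ratios, they have to be multiplied by the image size to obtain pixel coordinates. The following is a minimal sketch of that conversion. The function name to_pixel_box and the example numbers are assumptions for illustration only.

```python
def to_pixel_box(box, img_width, img_height):
    """Convert a ratio-based BoundingBox into (left, top, right, bottom) pixels."""
    left = img_width * box['Left']
    top = img_height * box['Top']
    right = left + img_width * box['Width']
    bottom = top + img_height * box['Height']
    return (left, top, right, bottom)


# Made-up box on an 800x400 image, chosen so the arithmetic is easy to follow.
example_box = {'Left': 0.25, 'Top': 0.5, 'Width': 0.5, 'Height': 0.25}
print(to_pixel_box(example_box, 800, 400))  # → (200.0, 200.0, 600.0, 300.0)
```

A tuple in this (left, top, right, bottom) order is also the form that Pillow's ImageDraw.rectangle accepts, which is one alternative to drawing the four edges as a line.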

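Drawing the detection results

Since the Name and BoundingBox values returned by Rekognition look sufficient to draw the detection results, let's visualize them with Pillow.

The steps are as follows.

Load the image data stored in S3
Overlay the Rekognition results on the loaded image

Continuing from the earlier program, I wrote the following.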

rekognition = session.client('rekognition')

for r in res['Contents']:
    filename = r['Key']

    labels = rekognition.detect_labels(
        Image={
            'S3Object': {
                'Bucket': 'rekognition-test-uehara',
                'Name': filename
            }
        }
    )

    # Load the image from S3
    s3_resource = session.resource('s3')
    s3_object = s3_resource.Object('rekognition-test-uehara', filename).get()

    stream = io.BytesIO(s3_object['Body'].read())
    image = Image.open(stream)

    # Draw the bounding boxes
    for l in labels['Labels']:
        if not l['Instances']:
            continue

        imgWidth, imgHeight = image.size  
        draw = ImageDraw.Draw(image)

        box = l['Instances'][0]['BoundingBox']
        left = imgWidth * box['Left']
        top = imgHeight * box['Top']
        width = imgWidth * box['Width']
        height = imgHeight * box['Height']

        points = (
            (left,top),
            (left + width, top),
            (left + width, top + height),
            (left , top + height),
            (left, top)
        )
        draw.line(points, fill='#00d400', width=2)
        
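        # Set the text position, font, and size
        txpos = (left, top-25)
        # 'Arial Unicode.ttf' must exist on the system; substitute any available TrueType font
        font = ImageFont.truetype('Arial Unicode.ttf', 20)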
        draw.text(txpos, l['Name'], font=font, fill='#00d400')

    image.show()
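One detail in the drawing code above: the label is placed 25 pixels above the box (top-25), so when a box starts within 25 pixels of the top edge of the image, the y coordinate becomes negative and the text is drawn partly or fully outside the image. A minimal sketch of one way to handle this, assuming a hypothetical helper named label_position that is not part of the original code:

```python
def label_position(left, top, text_height=25):
    """Place the label above the box, but never above the top edge of the image."""
    # Clamp at 0 so a box near the top edge keeps its label inside the image.
    return (left, max(0, top - text_height))


print(label_position(83, 19))   # box starts 19 px from the top edge
print(label_position(0, 130))   # box with enough room above it
```

With this helper, txpos = label_position(left, top) could stand in for the tuple in the loop. When the clamp applies, the label overlaps the top of the box instead of disappearing.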

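Execution result

An image like the following is displayed.

We can see that the horse and the person have both been detected properly.

Full code

The full code is shown below.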

import io
import os
import pprint

import boto3
from PIL import Image, ImageDraw, ImageFont

session = boto3.session.Session(profile_name='bs-uehara')

s3_client = session.client('s3')
res = s3_client.list_objects_v2(
    Bucket='rekognition-test-uehara'
)

rekognition = session.client('rekognition')

for r in res['Contents']:
    filename = r['Key']

    labels = rekognition.detect_labels(
        Image={
            'S3Object': {
                'Bucket': 'rekognition-test-uehara',
                'Name': filename
            }
        }
    )

    # Load the image from S3
    s3_resource = session.resource('s3')
    s3_object = s3_resource.Object('rekognition-test-uehara', filename).get()

    stream = io.BytesIO(s3_object['Body'].read())
    image = Image.open(stream)

    # Draw the bounding boxes
    for l in labels['Labels']:
        if not l['Instances']:
            continue

        imgWidth, imgHeight = image.size  
        draw = ImageDraw.Draw(image)

        box = l['Instances'][0]['BoundingBox']
        left = imgWidth * box['Left']
        top = imgHeight * box['Top']
        width = imgWidth * box['Width']
        height = imgHeight * box['Height']

        points = (
            (left,top),
            (left + width, top),
            (left + width, top + height),
            (left , top + height),
            (left, top)
        )
        draw.line(points, fill='#00d400', width=2)

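        # Set the text position, font, and size
        txpos = (left, top-25)
        # 'Arial Unicode.ttf' must exist on the system; substitute any available TrueType font
        font = ImageFont.truetype('Arial Unicode.ttf', 20)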
        draw.text(txpos, l['Name'], font=font, fill='#00d400')

    image.show()